practical applications. In this article, a fuzzy decision tree is introduced that tackles the problems of
tree complexity and memory limitation by incrementally inserting data sets into the tree. Membership
functions are generated automatically. Then fuzzy information gain is used

1. INTRODUCTION

Data mining is an active field of research in several disciplines, including artificial intelligence,
databases, statistics, visualization, and high-performance and parallel computing. The knowledge obtained by
data processing is used in many applications, ranging from customer retention, marketing research, science
exploration, and fraud detection to production control, health systems, education, security assessment, and
road traffic prediction and control [1]-[4]. Classification in data mining can be described as a supervised
task: given a training dataset with associated class labels, determine the suitable class label for an
unlabeled test instance. The decision tree (DT) is a popular classification method that constructs a
classification model in the form of a tree structure [5]. A DT is a structure similar to a flowchart, in which
each internal node represents a test of an attribute, each branch represents an outcome of the test, and
each leaf node represents a class label. The in-degree of the root node is zero, which means it has no
incoming edge. The tree performs classification by splitting its branches, where each split represents a test
on the data attributes. This branch splitting continues to the last level, called the terminal level, where all
the data tuples in a node belong to one class [6]. A variety of algorithms can be used to build decision
trees, including ID3, classification and regression trees (CART), C4.5, chi-squared automatic interaction
detection (CHAID), and quick, unbiased, efficient, statistical tree (QUEST) [7], [8].
One of the most challenging problems in decision tree induction is developing scalable algorithms that can
process large data sets whose size exceeds the memory capacity [9]. The size of a decision tree tends to
depend on the size of the training data, and conventional decision tree building approaches are inefficient
on such data. Several algorithms have been developed to construct DTs from large data sets. Sampling,
partitioning, distributed and parallel processing, and incremental methods are some of the basic techniques
for processing large data [9]. For example, supervised learning in quest (SLIQ) [10], RainForest [11],
classification for large or out-of-core datasets (CLOUDS) [12], bootstrapped optimistic algorithm for tree
(BOAT) [13], very fast decision tree (VFDT) [14], and decision tree using fast splitting attribute selection
(DTFS) [15] all try to solve this problem. Some of these algorithms store the entire data set in memory, while
others store only part of the training data; however, selecting a subset of the data is time-consuming and
computationally expensive. Besides, in these methods, the reliability of the model is reduced because part of
the data is lost. Some of them use lists to store the data in main memory, assigning one list to each
attribute in the dataset. The problem is that some of these lists require more space than is needed to store
the whole training set.

Another problem with tree-building algorithms is dealing with numerical features. The most
common method is to partition each feature into two or more intervals using all values of the attribute
through multiple cut points. The numerical features are thus divided into intervals, and the discrete
intervals behave like categorical values [16]. Discretization produces crisp intervals, so
that a feature value either belongs to an interval or does not. One important disadvantage of a sharp
cut point is that it makes the decision tree sensitive to noisy data. A solution to this problem is the use of
soft discretization based on fuzzy theory [17]. The literature proposes some techniques for selecting split
attributes [18], [19]; however, these techniques are not intended for large data sets, because some of
them must evaluate a large number of candidate partitions to select the best attribute, others use
discretization methods to process numerical attributes, and some use expensive techniques to expand nodes.

To deal with the problems mentioned above, the present study proposes an
incremental algorithm based on fuzzy partitioning. Because data enter the tree incrementally, there is no
need to store the entire dataset in main memory. For discrete attributes, the partition strategy divides the
training set according to all possible values of the selected attribute, resulting in one partition for each
possible value. For continuous attributes, a discretization step is needed. In this paper, our focus is on
the discretization of continuous values based on a fuzzy approach. With local discretization on the dataset of
each node, numerical values are converted to fuzzy values. Besides, in each node, the attribute with the
highest predictive ability is selected to create new branches. By building the decision tree from the entire
training dataset without storing all of the data in memory, and by eliminating the used records after each
node is expanded, memory overload is prevented and the reliability of the model is increased.


2. THE COMPREHENSIVE THEORETICAL BASIS

When a data set contains both continuous and nominal attributes, most rule induction techniques
discretize the continuous attributes into intervals and treat the discretized intervals like nominal values
within the induction process [20]. The aim of attribute discretization is to find concise data representations
as categories that are adequate for the learning task while retaining as much of the information in the
original continuous attributes as possible. The most common method is to partition each feature into two
or more intervals using all values of the attribute through multiple cut points [16]. A discretization
scheme $D_A$ on a continuous attribute $A$ divides this attribute into $k$ discrete and disjoint
intervals $\{[d_0, d_1], [d_1, d_2], \ldots, [d_{k-1}, d_k]\}$, where $d_0$ and $d_k$ are the minimum and
maximum values, respectively, and $P_A = \{d_1, d_2, \ldots, d_{k-1}\}$ represents the set of cut points of
$A$, arranged in ascending order. After the cut points are evaluated, the intervals of continuous values are
split according to some defined criterion. In the crisp case, discretization results in crisp ranges, and an
attribute value either belongs to a certain range or does not. To soften this sharpness, the discrete
intervals should be interpreted loosely. An intuitive way to obtain the fuzzy intervals for each continuous
attribute is to first discretize its domain into several crisp intervals [21]. The most important issue in
fuzzy logic is the definition of the number, type, and parameters of the membership functions. We use
triangular fuzzy partitions, as shown in Figure 1.

In a crisp decision tree, the split point is chosen as the midpoint between consecutive values of the
continuous attribute where the class information changes. Even when a split attribute with a specific value
is selected, there is no guarantee that this is the exact value at which the data should be divided. There is
always some fuzziness in the choice of the split point.
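
The midpoint rule described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `candidate_split_points` is hypothetical, and the sketch makes the simplifying assumption that ties between equal attribute values with different classes produce no split point.

```python
from typing import List, Sequence


def candidate_split_points(values: Sequence[float], labels: Sequence[str]) -> List[float]:
    """Return midpoints between consecutive sorted values whose class labels differ."""
    pairs = sorted(zip(values, labels))
    cuts: List[float] = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        # A crisp cut is placed only where the class changes between two distinct values.
        if c1 != c2 and v1 != v2:
            cuts.append((v1 + v2) / 2.0)
    return cuts


print(candidate_split_points([1.0, 2.0, 3.0, 4.0, 5.0], ["a", "a", "b", "b", "a"]))
```

Each returned value is a sharp boundary: a measurement slightly to either side of it is sent to a different branch, which is the sensitivity that fuzzy partitioning is meant to soften.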

Assume that $X$ is a universal set of variables $x$; a fuzzy set $S$ on $X$ is defined by a membership
function $\mu_S(x): X \to [0,1]$, which indicates the membership degree of $x$ in the fuzzy set $S$ [22].
The following notation is used: i) the set of attributes of a dataset is denoted by
$X = \{X_1, \ldots, X_K, Y\}$, where $X_i$ is the $i$-th attribute, $K$ is the number of input attributes, and
$Y$ is the output of a fuzzy classification model; ii) $P_f = \{A_{f,1}, \ldots, A_{f,j}, \ldots, A_{f,T_f}\}$
is a partition of $X_f$ consisting of $T_f$ fuzzy sets $A_{f,j}$; iii) the output $Y$ is a categorical
variable assuming values in the set $\{C_1, \ldots, C_K\}$ of $K$ possible classes $C_k$; iv) the examples in
the fuzzy dataset $S$ are denoted by $S = \{(X_1, \mu_S(X_1)), (X_2, \mu_S(X_2)), \ldots, (X_n, \mu_S(X_n))\}$,
where $X_i$ is the $i$-th example, $\mu_S(X_i)$ is the membership degree of $X_i$ in $S$, and $n$ is the
number of examples; and v) $\{F_i^{(1)}, F_i^{(2)}, \ldots, F_i^{(T_i)}\}$ denotes the fuzzy terms defined on
the $i$-th attribute, where $F_i^{(j)}$ is the $j$-th fuzzy term defined on attribute $X_i$.

Scalable decision tree based on fuzzy partitioning and an incremental approach (Somayeh Lotfi)
ISSN: 2088-8708


Fuzzy decision trees (FDT) combine decision trees with the approximate reasoning offered by fuzzy
representation in order to handle uncertainty. FDTs use fuzzy linguistic terms to specify the branching
conditions of nodes and allow examples to follow multiple branches simultaneously, with different
satisfaction degrees in the range [0, 1]. Each edge of an FDT is annotated with a condition, and each leaf is
annotated with a fuzzy set. In this paper, we use an FDT based on fuzzy information gain [16]. First, each
attribute is partitioned using strong and uniform triangular fuzzy partitions. The recursive procedure for
building the tree uses the fuzzy information gain to identify the best splitting attribute [23], as in
(1) to (3):


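$$\text{Fuzzy Information Gain}(S, A) = \text{Entropy}_f(S) - \sum_{v \in A} \frac{|S_v|}{|S|}\, \text{Entropy}_f(S_v) \quad (1)$$

$$\text{Entropy}_f(S) = -\sum_{i=1}^{C} \frac{\sum_{k=1}^{N} \mu_{ki}}{|S|} \log_2 \frac{\sum_{k=1}^{N} \mu_{ki}}{|S|} \quad (2)$$

$$\frac{|S_v|}{|S|} = \frac{\sum_{k=1}^{N} \mu_{kv}}{\sum_{k=1}^{N} \sum_{j=1}^{C_l} \mu_{kj}} \quad (3)$$

where $N$ represents the number of samples, $\mu_{kj}$ is the membership degree of the value of the $k$-th
sample in the $j$-th fuzzy set of the feature, and $C_l$ is the number of fuzzy sets for the attribute in
question. In (2), the outer sum runs over the $C$ classes, and $\mu_{ki}$ is the degree with which the $k$-th
sample contributes to class $i$.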
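
As an illustration of how (1) to (3) can be evaluated at a node, the following Python sketch computes a fuzzy information gain from a membership matrix. It is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, every sample is assumed to belong fully to the node (weight 1) when the node entropy is computed, and class proportions inside a fuzzy subset are weighted by the membership degrees.

```python
import math
from typing import Dict, List, Sequence


def fuzzy_entropy(weights: Sequence[float], labels: Sequence[str]) -> float:
    """Entropy of the class distribution, each sample weighted by its membership degree."""
    total = sum(weights)
    if total == 0:
        return 0.0
    mass: Dict[str, float] = {}
    for w, c in zip(weights, labels):
        mass[c] = mass.get(c, 0.0) + w
    return -sum((m / total) * math.log2(m / total) for m in mass.values() if m > 0)


def fuzzy_information_gain(memberships: List[List[float]], labels: Sequence[str]) -> float:
    """memberships[k][v] is the membership of sample k in fuzzy set v of the attribute."""
    if not memberships:
        return 0.0
    # Assumption: at the node itself every sample has weight 1.
    gain = fuzzy_entropy([1.0] * len(labels), labels)
    grand_total = sum(sum(row) for row in memberships)  # denominator of (3)
    for v in range(len(memberships[0])):
        column = [row[v] for row in memberships]
        s_v = sum(column)                               # numerator of (3)
        if s_v > 0:
            gain -= (s_v / grand_total) * fuzzy_entropy(column, labels)
    return gain


separating = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
uninformative = [[0.5, 0.5]] * 4
print(fuzzy_information_gain(separating, ["a", "a", "b", "b"]))
print(fuzzy_information_gain(uninformative, ["a", "a", "b", "b"]))
```

An attribute whose fuzzy sets separate the classes receives a high gain, while one whose fuzzy sets contain the same class mixture as the node receives a gain near zero.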


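Figure 1. The triangular fuzzy partition (vertical axis: membership degree from 0 to 1; horizontal axis:
attribute value y, with the points a_j of the fuzzy sets marked)

3. METHOD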


This paper proposes an algorithm called IFDT for building a decision tree based on the idea of
incremental construction, in which the training data in every node of the tree are processed incrementally.
IFDT is the fuzzy version of the IDT algorithm: continuous values are converted to fuzzy values, the
attribute with the best classification ability is chosen for branching using an evaluation criterion, and
the children are branched from that attribute. The IFDT algorithm is split into four stages, which are
described in the following subsections.


3.1. First stage 

The first stage creates the root node of the decision tree, and training objects are fed into this node
one by one until a given condition is met. N is the maximum number of objects that the root node can hold at
any time. Once this value is reached, the node needs to be expanded. To expand the node, a feature that
partitions the data into homogeneous subsets should be chosen. For this purpose, the training records are
converted to fuzzy values using the defined membership functions.


3.2. Second stage 

In this stage, we find the fuzzy sets for the $j$-th quantitative attribute. The range of the considered
attribute is from min to max. The set $\{a_{1j}, a_{2j}, \ldots, a_{kj}\}$ denotes the middle points of the
fuzzy sets of the $j$-th attribute. Existing methods for generating membership functions either act
independently of the sample distribution or are based on it. In the present study, the membership functions
are generated automatically using the method presented in [24], as follows. First, the initial cut-off points
are generated using crisp discretization. These points define a set of distinct intervals that can be
described by crisp membership functions: if the value of the attribute lies within the relevant interval, the
membership value is 1; otherwise, it is 0. When the intervals are instead described by overlapping fuzzy
membership functions, a point near a cut-off point is assigned to two fuzzy sets, with membership degrees
greater than 0 and less than 1 in both. The sum of two adjacent membership functions is always one,
and the points where these functions cross coincide with the cut-off points of the interval partition.
Triangular membership functions can be generated as (4), where $v$ is a value of the continuous feature $A$,
$L_j$ represents the $j$-th fuzzy term assigned to attribute $A$, and $\mu_{L_j}(v)$ is the fuzzy membership
function that determines the membership degree of value $v$ of attribute $A$ in the corresponding linguistic
term.


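$$\mu_{L_j}(v) = \begin{cases} 0, & v \ge a_{j+1} \\ \dfrac{a_{j+1} - v}{a_{j+1} - a_j}, & a_j < v < a_{j+1} \\ 1, & v = a_j \\ \dfrac{v - a_{j-1}}{a_j - a_{j-1}}, & a_{j-1} < v < a_j \\ 0, & v \le a_{j-1} \end{cases} \quad (4)$$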


The values of $a_j$ can be calculated from the set of cut points $d_k$ as (5):

$$a_j = \begin{cases} d_k - \dfrac{d_{k+1} - d_k}{2}, & j = k = 1 \\ a_{j-1} + 2\,(d_{j-1} - a_{j-1}), & j = k > 1,\ d_{j-1} > a_{j-1} \end{cases} \quad (5)$$
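
The construction in (4) and (5) can be sketched in Python as follows. This is a minimal sketch, not the paper's code: the function names are hypothetical, K cut points are assumed to yield K+1 points $a_j$, and the first and last fuzzy sets are treated as shoulders (membership 1 beyond the outermost points), which is an assumption the text does not state.

```python
from typing import List


def fuzzy_centers(cut_points: List[float]) -> List[float]:
    """Derive the points a_j from ascending cut points d_k, following (5)."""
    if len(cut_points) < 2:
        raise ValueError("at least two cut points are needed to place the first point")
    d = sorted(cut_points)
    centers = [d[0] - (d[1] - d[0]) / 2.0]          # case j = k = 1
    for cut in d:
        prev = centers[-1]
        if cut <= prev:                              # condition d_{j-1} > a_{j-1}
            raise ValueError("cut point must lie above the previous point a_j")
        centers.append(prev + 2.0 * (cut - prev))    # reflect a_{j-1} about the cut point
    return centers


def memberships(v: float, centers: List[float]) -> List[float]:
    """Triangular membership degrees of v in every fuzzy set, following (4)."""
    mu = [0.0] * len(centers)
    if v <= centers[0]:        # assumed left shoulder
        mu[0] = 1.0
        return mu
    if v >= centers[-1]:       # assumed right shoulder
        mu[-1] = 1.0
        return mu
    for j in range(len(centers) - 1):
        left, right = centers[j], centers[j + 1]
        if left <= v <= right:
            mu[j + 1] = (v - left) / (right - left)
            mu[j] = 1.0 - mu[j + 1]   # adjacent memberships sum to one
            break
    return mu


centers = fuzzy_centers([2.0, 4.0, 6.0])
print(centers)
print(memberships(2.0, centers))
```

Because each point $a_j$ is the reflection of $a_{j-1}$ about a cut point, two adjacent triangles cross at membership 0.5 exactly on that cut point, which matches the property described in the second stage.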


3.3. Third stage 

Then, to select the best feature for branching, the fuzzy information gain is computed as in (1) to (3).
To choose the best split attribute when a leaf needs to be expanded, only the instances stored on that leaf
are considered. The idea is to select the split attribute, together with a set of split values (one for each
class), that best reproduces the partition defined by the instance classes on the leaf to be expanded. For a
selected feature with numerical values, one edge is created for each of the corresponding fuzzy values, and
each edge is labeled with its fuzzy set. For a categorical feature, one edge is created for each possible
value of the selected attribute. Tree traversal is performed using the selected attribute and the edge
values. Then, the records stored in the node are deleted. Next, another instance is entered into the tree
incrementally and converted to fuzzy values based on the membership functions. The entered sample traverses
the tree edges until it reaches the leaves, following every edge whose branching condition is satisfied by
the membership degrees of its features. A new sample may therefore be stored in one or more leaves with
different membership degrees. This phase continues until all records of the training dataset have been
processed. The pseudocode of the proposed algorithm is shown in Figure 2.


IFDT (TS) // Incremental Fuzzy Decision Tree
    Input: TS is the training dataset
    ROOT = CreateNode ()
    For each I ∈ TS do
        UpdateFDT (I, ROOT)
    End for
End.

UpdateFDT (I, NODE)
    FI = FuzzyIns (I) // convert instance I to fuzzy values
    AddInstanceToNode (NODE, FI)
    If the number of instances in NODE has reached s then // s: node capacity
        ExpandNode (NODE)
    End if
    For each edge Rj ∈ NODE do
        memval[j] = ComputeMembershipValue (FI, NODE.Rj)
        If memval[j] != 0 then
            UpdateNode (FI, NODE.Rj)
        End if
    End for
End.

ExpandNode (NODE)
    If NODE.Classes > 1 then
        NODE.BestAttr = ChooseBestAttribute ()
        For each FuzzyInterval i in NODE.BestAttr do
            Ri = CreateEdge ()
            Leafi = CreateNode ()
        End for
        Delete (NODE.Ins)
    Else
        UpdateEdge (NODE.Ins, NODE.Input.Attr)
        Delete (NODE.Ins)
        NODE.numins = 0
    End if
End.


Figure 2. Pseudo-code of the proposed algorithm 
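
The buffering and routing behaviour of Figure 2 can be made concrete with a small Python sketch. It is a simplified illustration, not the authors' implementation: all names are hypothetical, instances are assumed to arrive already fuzzified, the attribute selector is passed in as a callable instead of computing the fuzzy information gain, and leaves keep class counts (an assumption) so that a label survives after the buffer is deleted.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A fuzzified instance: attribute -> {fuzzy term -> membership degree}, plus a class label.
FuzzyInstance = Tuple[Dict[str, Dict[str, float]], str]
Chooser = Callable[[List[FuzzyInstance]], str]


class Node:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity                 # the threshold s in Figure 2
        self.buffer: List[FuzzyInstance] = []    # instances waiting at this leaf
        self.class_counts: Dict[str, int] = {}   # assumed bookkeeping for leaf labels
        self.split_attr: Optional[str] = None
        self.children: Dict[str, "Node"] = {}


def expand_node(node: Node, choose_attr: Chooser) -> None:
    """Split a leaf that holds more than one class, then delete its stored instances."""
    if len({label for _, label in node.buffer}) > 1:
        node.split_attr = choose_attr(node.buffer)
        terms = node.buffer[0][0][node.split_attr]
        node.children = {term: Node(node.capacity) for term in terms}
    node.buffer.clear()


def update_fdt(node: Node, inst: FuzzyInstance, choose_attr: Chooser) -> None:
    """Route one instance down every edge with non-zero membership, or buffer it at a leaf."""
    fuzzy_values, label = inst
    if node.split_attr is not None:
        for term, degree in fuzzy_values[node.split_attr].items():
            if degree != 0:
                update_fdt(node.children[term], inst, choose_attr)
        return
    node.buffer.append(inst)
    node.class_counts[label] = node.class_counts.get(label, 0) + 1
    if len(node.buffer) >= node.capacity:
        expand_node(node, choose_attr)


root = Node(capacity=2)
stream: List[FuzzyInstance] = [
    ({"x": {"low": 1.0, "high": 0.0}}, "a"),
    ({"x": {"low": 0.0, "high": 1.0}}, "b"),
    ({"x": {"low": 0.5, "high": 0.5}}, "a"),
]
for instance in stream:
    update_fdt(root, instance, lambda buf: "x")
print(root.split_attr, {t: len(c.buffer) for t, c in root.children.items()})
```

In this sketch at most `capacity` instances are held at any leaf, and the third instance, which belongs to both fuzzy terms with degree 0.5, is stored in both children.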

3.4. Fourth stage

To traverse the DT, an instance starts from the root and descends through internal nodes until
it reaches a leaf. To descend from a node, IFDT follows the edge of the split attribute whose value best
matches the corresponding attribute value of the instance being classified. This is done by finding the
minimum absolute difference between the instance value and the edge values. As in all FDT algorithms, the
classification process in IFDT consists of traversing the DT with an unseen instance until it reaches a leaf
and assigning the class label associated with that leaf to the new instance.
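
The traversal rule can be sketched in Python as follows. The sketch rests on assumptions that the text does not confirm: `TreeNode` and `classify` are hypothetical names, each edge is assumed to be represented by a single numeric value (for example the point of its fuzzy set where membership is 1), and only numeric attributes are handled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TreeNode:
    label: Optional[str] = None                    # set on leaves
    attr: Optional[str] = None                     # split attribute on internal nodes
    edges: Dict[float, "TreeNode"] = field(default_factory=dict)  # edge value -> child


def classify(node: TreeNode, instance: Dict[str, float]) -> Optional[str]:
    """Descend from the root, always taking the edge closest to the instance value."""
    while node.attr is not None:
        value = instance[node.attr]
        best_edge = min(node.edges, key=lambda edge_value: abs(value - edge_value))
        node = node.edges[best_edge]
    return node.label


tree = TreeNode(attr="x", edges={1.0: TreeNode(label="a"), 5.0: TreeNode(label="b")})
print(classify(tree, {"x": 1.8}), classify(tree, {"x": 4.0}))
```

With this rule an instance follows exactly one path at classification time, even though training instances may be stored in several leaves.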


4. RESULTS AND DISCUSSION 

The general method of performing the tests is as follows: in the preprocessing step, the values of the
numerical features are discretized, and the fuzzy membership functions are defined based on the discrete
intervals. Then, the dataset enters the tree incrementally, and the numerical attributes in each node are
converted to fuzzy values using the membership functions defined in the preprocessing step. Next, the
incremental FDT building algorithm is applied to the fuzzified data. The datasets used for the experiments
are described in Table 1. They are characterized by varying numbers of instances, classes, and attributes.
For each dataset, the number of numeric and categorical attributes is specified.

We used 10-fold cross-validation for each dataset and algorithm. The data are divided so that, in each
run, 90% of the data is used for training and 10% is used for testing. This procedure is repeated for each
fold of the data to validate the results. In all the experiments, we evaluated the accuracy rate on the
testing set. We compared our algorithm with three algorithms, namely the generalized fuzzy partition
entropy-based fuzzy ID3 algorithm (GFIDT3) [25], fuzzy multi-way decision trees (FMDT) [26], and fuzzy
binary decision trees (FBDTs) [26], in terms of accuracy and tree complexity. The accuracy is computed
as (6).


$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \quad (6)$$
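
For reference, (6) can be written as a short Python function. The reading of TP, TN, FP, and FN as the usual true-positive, true-negative, false-positive, and false-negative counts is an assumption, since the text does not define them, and the counts in the example are invented for illustration only.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of test instances that were classified correctly, as in (6)."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("accuracy is undefined for an empty test set")
    return (tp + tn) / total


# Illustrative counts only, not results from the paper.
print(accuracy(tp=40, tn=50, fp=5, fn=5))
```

Under 10-fold cross-validation this value would be computed once per fold on the held-out 10% and then averaged.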


The accuracy of the algorithms is shown in Table 2. Another criterion for comparing decision trees
is the complexity of the tree, expressed in terms of the number of nodes, the tree depth, and the number of
leaves. The results for the FMDT and FBDT (B=15) algorithms are taken from [26]. The total number of
nodes, the number of leaves, and the depth of the decision tree produced by each technique are provided in
Table 3 to assess the tree's complexity for each dataset.

As can be seen in Table 3, the total number of nodes and the number of leaf nodes of the
proposed algorithm are lower than those of the other algorithms in almost every case (the only exception is
the node count on Susy, where GFIDT3 has 18076 nodes against 18090 for IFDT), which indicates a reduction in
the complexity of the decision tree. Despite this reduction in complexity, the accuracy has not decreased
much, which reflects the balance between complexity and accuracy achieved by the proposed algorithm.


Table 1. The datasets used in tests
Dataset      Instances   Attributes               Classes
Poker-Hand   1025010     10 (cat: 10)             2
ECO-E        4178504     16 (num: 16)             10
KDD99_2      4856151     41 (num: 26, cat: 15)    2
KDD99_5      4898431     41 (num: 26, cat: 15)    5
Susy         5000000     18 (num: 18)             2

Table 2. The accuracy achieved by algorithms (%)
Dataset      GFIDT3   IFDT    FMDT    FBDT
Poker-Hand   67.17    62.55   TIAT    62.47
ECO-E        97.58    96.67   97.58   97.26
Susy         80.96    79.93   79.63   79.72
KDD99_2      99.99    99.80   99.98   99.99
KDD99_5      99.97    99.98   99.97   99.99


Table 3. Complexities of trees
             GFIDT3                  IFDT                    FMDT                     FBDT
Dataset      Nodes   Leaves  Depth   Nodes   Leaves  Depth   Nodes    Leaves   Depth  Nodes   Leaves  Depth
Poker-Hand   30940   18561   18.60   29400   17340   18.20   30940    28561    4      44297   22149   14.75
ECO-E        16264   8448    20.73   15980   8005    19.50   222694   200048   2.73   17532   9370    24.23
Susy         18076   9754    30.46   18090   9650    29.80   805076   758064   3.46   21452   10723   14.62
KDD99_2      151     91      8.54    138     87      8.10    703      630      2.54   222     112     10.18
KDD99_5      654     302     10.65   609     286     10.04   2716     2351     2.6    719     389     10.65


5. CONCLUSION 
In this paper, we have presented an algorithm for the incremental induction of fuzzy decision trees.
The proposed algorithm does not need to store the entire training set in memory, yet it processes all
instances of the training set. In a non-fuzzy algorithm, selecting the best attribute for branching on
continuous attributes requires a calculation for each of the attribute values; therefore, the decision tree
construction time is higher than with a fuzzy algorithm. In the proposed algorithm, the fuzzy information
gain criterion is used to find the best attribute for branching, and the accuracy of the tree is high because
the decision tree is built from the entire dataset. In general, the most important contributions of the
proposed framework are: achieving a balance between tree accuracy and complexity; solving the memory
limitation problem for large datasets by entering data incrementally into the decision tree; increasing model
reliability by building the decision tree from all the training data; and avoiding memory and time overhead
because no special data structures are used.

The results of the implementation of incremental decision trees are compared with two
non-incremental and non-fuzzy methods for large data sets. What can be deduced from the test results is that,
in the incremental method, the tree construction time is shorter because only the data stored in a node are
used when the best branching attribute is selected for that node. Since the number of branches generated by
each numeric attribute in an FDT equals the number of fuzzy sets defined for that attribute, fewer nodes are
created in the tree. It should be noted, however, that an FDT requires a preprocessing stage to determine the
cut-off points and define the fuzzy sets of the numerical features. In future research, it will be possible
to evaluate the impact of each fuzzification method on the accuracy and complexity of the decision tree.